Dynamo AI Resource Provisioning Guidelines
This reference outlines resource provisioning recommendations for the Dynamo AI platform based on expected feature utilization and workloads, helping ensure optimal performance across use cases and scenarios.
Scaling Considerations
Dynamo AI platform resource recommendations are based on the following metrics:
- Throughput: Number of requests per second
- Guardrails: Number of guardrails applied per moderation request (DynamoGuard)
Throughput Scenarios
Below, we provide resource requirements for different throughput scenarios, ranging from < 1 QPS to 100 QPS. For context, we typically observe production workloads of 0.1 - 10 QPS across our customers' AI use cases; however, Dynamo AI can support peak workloads exceeding 250 QPS.
Example: For an AI use case with 100k global users, a throughput of 10 QPS equates to approximately 8-12 queries per user per day.
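The arithmetic behind the example above can be sketched as a small helper that converts a sustained throughput into average daily queries per user. The 100k-user and 10 QPS figures come from the example; the function name is illustrative and not part of the Dynamo AI platform.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def queries_per_user_per_day(qps: float, num_users: int) -> float:
    """Average daily queries per user implied by a sustained QPS."""
    return qps * SECONDS_PER_DAY / num_users

# 10 QPS spread across 100k users works out to ~8.6 queries
# per user per day, consistent with the range quoted above.
print(round(queries_per_user_per_day(10, 100_000), 1))  # 8.6
```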
| Scenario | Expected Throughput (QPS) | Use Cases |
| --- | --- | --- |
| Development | 1 QPS | Testing environments |
| Small | 5 QPS | Lightweight production scenarios |
| Medium | 10 QPS | Moderate-scale AI applications |
| Large | 50 QPS | High-demand production applications |
| Extra Large | 100 QPS | Enterprise-scale, high-performance systems |
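To pick a scenario programmatically, the tiers above can be encoded as a simple lookup that returns the smallest scenario rated at or above an expected peak throughput. The thresholds mirror the table; the helper itself is a sketch, not part of the platform.

```python
# (max rated QPS, scenario name), in ascending order, per the table above
SCENARIOS = [
    (1, "Development"),
    (5, "Small"),
    (10, "Medium"),
    (50, "Large"),
    (100, "Extra Large"),
]

def sizing_scenario(expected_qps: float) -> str:
    """Return the smallest scenario whose rated QPS covers expected_qps."""
    for max_qps, name in SCENARIOS:
        if expected_qps <= max_qps:
            return name
    # Peak workloads above 100 QPS warrant a custom sizing discussion.
    return "Contact Dynamo AI"

print(sizing_scenario(7))  # 7 QPS falls into the Medium (10 QPS) tier
```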
Resource Guidelines
General Platform
Base platform resources are used for API and UI servers. The table below outlines recommended configurations:
| Scenario | Recommended Resources | Example Cloud-Specific Details |
| --- | --- | --- |
| Development | x32 vCPUs, 64GB memory | AWS: x8 c7i.xlarge; Azure: x8 Standard_F4s_v2; GCP: x8 c2d-standard-4 |
| Small | x32 vCPUs, 64GB memory | AWS: x8 c7i.xlarge; Azure: x8 Standard_F4s_v2; GCP: x8 c2d-standard-4 |
| Medium | x64 vCPUs, 128GB memory | AWS: x16 c7i.xlarge; Azure: x16 Standard_F4s_v2; GCP: x16 c2d-standard-4 |
| Large | x128 vCPUs, 256GB memory | AWS: x32 c7i.xlarge; Azure: x32 Standard_F4s_v2; GCP: x32 c2d-standard-4 |
| Extra Large | x128 vCPUs, 256GB memory | AWS: x32 c7i.xlarge; Azure: x32 Standard_F4s_v2; GCP: x32 c2d-standard-4 |
Note: This table is a general reference. You may need fewer resources than listed, since general platform components can run on the GPU nodes and share their vCPUs and RAM; however, provisioning the amounts above guarantees consistent performance.
DynamoGuard Content Guardrails
DynamoGuard requires resources based on the number of guardrails applied to a workload. While CPUs can handle a limited number of guardrails at higher latency, GPUs offer significantly lower latency (< 300ms). For lower latency when using CPUs, we recommend compute-optimized instances. Below, we provide the resource requirements for input content guardrails. For details about output content guardrails, please reach out to our team.
Tip: For non-development workloads, we recommend GPUs due to reduced latency and higher scalability.
Note: Calculate the number of GPUs you need based on your scenario and the number of policies you will deploy in the cluster.
| Scenario | CPU Option | GPU Option | Example Cloud-Specific Instances |
| --- | --- | --- | --- |
| Development | x8 vCPUs, 8GB memory per guardrail | Same as Small scenario | AWS: c7i.xlarge; Azure: Standard_F4s_v2; GCP: c2d-standard-4 |
| Small | Not Recommended | 1 A10G GPU per 10 guardrails | AWS: g5.2xlarge; Azure: NV36ads_A10_v5; GCP: g2-standard-8 |
| Medium | Not Recommended | 1 A10G GPU per 6 guardrails | AWS: g5.2xlarge; Azure: NV36ads_A10_v5; GCP: g2-standard-8 |
| Large | Not Recommended | 1 A10G GPU per guardrail | AWS: g5.2xlarge; Azure: NV36ads_A10_v5; GCP: g2-standard-8 |
| Extra Large | Not Recommended | 2 A10G GPUs per guardrail | AWS: g5.2xlarge; Azure: NV36ads_A10_v5; GCP: g2-standard-8 |
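The per-scenario GPU ratios above can be turned into a back-of-envelope sizing calculation: divide the number of guardrails by the guardrails-per-GPU ratio and round up. The ratios are copied from the table; the helper is a sketch, not an official sizing tool.

```python
import math

# Guardrails served per A10G GPU, per the table above.
# "Extra Large" needs 2 GPUs per guardrail, i.e. 0.5 guardrails per GPU.
GUARDRAILS_PER_GPU = {
    "Small": 10,
    "Medium": 6,
    "Large": 1,
    "Extra Large": 0.5,
}

def a10g_gpus_needed(scenario: str, num_guardrails: int) -> int:
    """A10G GPUs required to serve num_guardrails in the given scenario."""
    ratio = GUARDRAILS_PER_GPU[scenario]
    return math.ceil(num_guardrails / ratio)

print(a10g_gpus_needed("Medium", 8))       # 8 guardrails / 6 per GPU -> 2
print(a10g_gpus_needed("Extra Large", 3))  # 2 GPUs per guardrail -> 6
```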
Data Generation
Data generation is a required step in custom content policy creation. Dynamo AI supports several external and in-cluster model configurations for data generation. If required, contact Dynamo support for additional model providers.
| Option | Description | Cloud-Specific Details |
| --- | --- | --- |
| Option 1 | Azure Llama 3.1-8B: azure_ai/Meta-Llama-3-1-8B-Instruct | N/A |
| Option 2 | AWS Llama 3.1-8B: bedrock/llama/us.meta.llama3-1-8b-instruct-v1:0 | N/A |
| Option 3 | GCP Llama 3.1-8B: llama-3.1-8b-instruct-maas | N/A |
| Option 4 (less performant) | In-cluster model: x1 A10G GPU + x8 vCPUs | AWS: x1 g5.2xlarge; Azure: x1 NV36ads_A10_v5; GCP: x1 g2-standard-8 |
Guardrail Fine-Tuning
Guardrail fine-tuning is a required step in custom content policy creation. DynamoGuard offers two options for fine-tuning guardrails: choose between SaaS fine-tuning and in-cluster fine-tuning based on your infrastructure.
| Option | Description | Cloud-Specific Details |
| --- | --- | --- |
| Option 1 | Fine-tune on the Dynamo SaaS environment and import policies into your cluster | N/A |
| Option 2 | x1 A10G (or similar) GPU with x8 vCPUs, 32GB RAM, and 24GB GPU memory | AWS: x1 g5.2xlarge; Azure: x1 NV36ads_A10_v5; GCP: x1 g2-standard-8 |
DynamoGuard Hallucination Guardrails
For hallucination guardrails, Dynamo supports both external and in-cluster configurations. If required, contact Dynamo support for additional model providers.
| Option | Description | Cloud-Specific Details |
| --- | --- | --- |
| Option 1 | Azure-Open-AI-GPT-4o or Open-AI-GPT-4o | N/A |
| Option 2 | In-cluster: requires x3 A10G GPUs at 1 QPS. Additional GPUs scale linearly. | AWS: x3 g5.2xlarge; Azure: x3 NV36ads_A10_v5; GCP: x3 g2-standard-8 |
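The linear scaling rule for the in-cluster option can be sketched as follows, assuming (our interpretation of the rule above) 3 A10G GPUs per 1 QPS of sustained throughput, rounded up to whole GPUs. Confirm the exact ratio with the Dynamo team before provisioning.

```python
import math

# Per the in-cluster option above: x3 A10G GPUs handle 1 QPS,
# and capacity scales linearly with additional GPUs.
GPUS_PER_QPS = 3

def hallucination_gpus(target_qps: float) -> int:
    """A10G GPUs needed for in-cluster hallucination guardrails at target_qps."""
    return math.ceil(GPUS_PER_QPS * target_qps)

print(hallucination_gpus(1))    # 3
print(hallucination_gpus(2.5))  # 8
```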
DynamoEval
DynamoEval requires the following resource configurations. The API endpoints are used for data generation and judgment.
| Requirement | Description | Cloud-Specific Details |
| --- | --- | --- |
| API Endpoints | mistral-small-latest, open-mistral-nemo, Open-AI-GPT-4o | N/A |
| CPU and memory | x1 vCPU, 4GB memory | AWS: x1 c7i.xlarge; Azure: x1 Standard_F4s_v2; GCP: x1 c2d-standard-4 |